{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1> Implementing New Methods </h1>\n",
    "\n",
    "This tutorial gives an overview of important interfaces for implementing new prediction methods or writing wrapper classes for external tools. It will also illustrate an easy example of integrating a new epitope prediction method.\n",
    "\n",
    "**Note**: If you are manipulating positional information of transcripts, proteins, peptides, or variants please note that FRED2 assumes the starting position at 0 not at 1!\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Chapter 1: Abstract Base-Classes for Prediction Methods </h2>\n",
    "\n",
    "We will illustrate the different interfaces based on epitope prediction, but there exist similar interfaces for cleavage site, cleavage fragment, and TAP prediction methods.\n",
    "\n",
    "All epitope prediction methods have to implement at least a very rudimentary interface called `Fred2.Core.AEpitopePrediction`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "class AEpitopePrediction(object):\n",
    "    __metaclass__ = APluginRegister\n",
    "\n",
    "    @abc.abstractproperty\n",
    "    def name(self):\n",
    "        \"\"\"\n",
    "        Returns the name of the predictor\n",
    "\n",
    "        :return: str \n",
    "        \"\"\"\n",
    "        raise NotImplementedError\n",
    "\n",
    "    @abc.abstractproperty\n",
    "    def version(cls):\n",
    "        \"\"\"\n",
    "        Returns the version of the predictor\n",
    "        \n",
    "        :return: str\n",
    "        \"\"\"\n",
    "        raise NotImplementedError\n",
    "\n",
    "    @abc.abstractproperty\n",
    "    def supportedAlleles(self):\n",
    "        \"\"\"\n",
    "        Returns a list of valid allele models\n",
    "\n",
    "        :return: List of allele names for which the predictor provides models\n",
    "        \n",
    "        :return: set(str) - Iterable of supported Alleles e.g. [A*01:02, B*07:05]\n",
    "        \"\"\"\n",
    "        raise NotImplementedError\n",
    "\n",
    "    @abc.abstractproperty\n",
    "    def supportedLength(self):\n",
    "        \"\"\"\n",
    "        Returns a list of supported peptide lengths\n",
    "\n",
    "        :return: set(int) - Iterable of supported peptide lengths e.g. [8, 9, 10]\n",
    "        \"\"\"\n",
    "        raise NotImplementedError\n",
    "\n",
    "\n",
    "    @abc.abstractmethod\n",
    "    def convert_alleles(self, alleles):\n",
    "        \"\"\"\n",
    "        Converts alleles into the internal allele representation of the predictor\n",
    "        and returns a string representation\n",
    "\n",
    "        :param list(Allele) alleles: The alleles for which the internal predictor\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h3> APSSMEpitopePrediction </h3>\n",
    "The prediction methods are further separated into so called `PSSM`, `SVM`, and `External` modules. Methods in `PSSM` are fully integrated linear prediction methods, whose weight matrices can be found in `Fred2.Data`. These methods only have to inherit from `Fred2.EpitopePrediction.APSSMEpitopePrediction`. This abstract base-class already implements the prediction method. If the prediction matrices can be found in `Fred2.Data` and are in the correct format such as this:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "<name>_<locus>_<super-and-sub-digit>_<length> = \n",
    "{\n",
    "0: {'A': 0.0, 'C': 0.0, 'E': 1.09861228867, 'D': 1.09861228867, 'G': 0.0, 'F': 0.0, 'I': 0.0, 'H': 0.0, 'K': 0.0,\n",
    "    'M': 0.0, 'L': 0.0, 'N': 0.0, 'Q': 0.0, 'P': -2.30258509299, 'S': 0.0, 'R': 0.0, 'T': 0.0, 'W': 0.0, 'V': 0.0,\n",
    "    'Y': 0.0},\n",
    "1: {'A': 0.0, 'C': 0.0, 'E': -2.30258509299, 'D': -2.30258509299, 'G': 0.0, 'F': -2.30258509299, 'I': 0.0, 'H': 0.0,\n",
    "    'K': 1.09861228867, 'M': 0.0, 'L': 0.0, 'N': 0.0, 'Q': 0.0, 'P': 0.0, 'S': 0.0, 'R': 2.99573227355, 'T': 0.0,\n",
    "    'W': -2.30258509299, 'V': 0.0, 'Y': -2.30258509299},\n",
    "    ......\n",
    "8: {'A': 0.0, 'C': 0.0, 'E': -2.30258509299, 'D': -2.30258509299, 'G': -2.30258509299, 'F': 0.0, 'I': 1.38629436112,\n",
    "    'H': -2.30258509299, 'K': -2.30258509299, 'M': 1.38629436112, 'L': 2.99573227355, 'N': -1.60943791243,\n",
    "    'Q': -2.30258509299, 'P': -2.30258509299, 'S': 0.0, 'R': -2.30258509299, 'T': 0.0, 'W': 0.0, 'V': 1.38629436112,\n",
    "    'Y': 0.0}, \n",
    "\n",
    "#bias term stored at -1 \n",
    "-1: {'con': -2.99573227355}\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "only the class properties have to be implemented. If however, the prediction function is much more complicated, please still inherit from `Fred2.EpitopePrediction.APSSMEpitopePrediction` and overwrite the prediction function accordingly."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h3>ASVMEpitopePrediction</h3>\n",
    "Methods that can be found in the module SVM are also fully integrated into Fred2 and their fully trained SVMs can be found in `Fred2.Data.svms`. Fred2 is using the python binding of `svmlight`. Therefore, the SVM model files have to be in svmlight-format. All Fred2 SVM classes implement the interface `Fred2.Core.ASVM` besides the basic `Fred2.Core.AEpitopePrediction`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class ASVM(object):\n",
    "    \"\"\"\n",
    "        Base class for SVM prediction tools\n",
    "    \"\"\"\n",
    "    __metaclass__ = abc.ABCMeta\n",
    "\n",
    "    @abc.abstractmethod\n",
    "    def encode(self, peptides):\n",
    "        \"\"\"\n",
    "        Returns the feature encoding for peptides\n",
    "\n",
    "        :param List(Peptide)/Peptide peptides: List or Peptide object\n",
    "        :return: list(Object) -- Feature encoding of the Peptide objects\n",
    "        \"\"\"\n",
    "        raise NotImplementedError\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This interface ensures that the peptides are in the correct input format to call SVMlights prediction method. The abstract base-class `Fred2.EpitopePrediction.ASVMEpitopePrediction` implements a rudimentary implementation of the function `predict`. If all SVM files are correctly stored in `Fred2.Data` and `encode()` its implemented it suffices to inherit from `Fred2.EpitopePrediction.ASVMEpitopePrediction` and implement the defined Class properties."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h3> AExternalEpitopePrediction </h3>\n",
    "Methods in this module are loosely integrated into Fred2. Fred2 is simply calling their command line tools and pre- and post-processes the in- and output of the tools. Hence the interfaces are much more involved as the ones before. All external tools have to implement `Fred2.Core.AExternal`:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class AExternal(object):\n",
    "    \"\"\"\n",
    "     Base class for external tools\n",
    "    \"\"\"\n",
    "    __metaclass__ = abc.ABCMeta\n",
    "\n",
    "    @abc.abstractproperty\n",
    "    def command(self):\n",
    "        \"\"\"\n",
    "        defines the external execution command \n",
    "        e.g. netMHC -p {peptides} -a {alleles} -x {out} {options} \n",
    "        \"\"\"\n",
    "\n",
    "    @abc.abstractmethod\n",
    "    def parse_external_result(self, _file):\n",
    "        \"\"\"\n",
    "        Parses external results and returns a AResult object\n",
    "\n",
    "        :param str _file: The file path or the external prediction results\n",
    "        :return: AResult - Returns a AResult object\n",
    "        \"\"\"\n",
    "        raise NotImplementedError\n",
    "\n",
    "    def is_in_path(self):\n",
    "        \"\"\"\n",
    "        checks whether the specified execution command can be found in PATH\n",
    "\n",
    "        :return: bool - Whether or not command could be found in PATH\n",
    "        \"\"\"\n",
    "        exe = self.command.split()[0]\n",
    "        for try_path in os.environ[\"PATH\"].split(os.pathsep):\n",
    "            try_path = try_path.strip('\"')\n",
    "            exe_try = os.path.join(try_path, exe).strip()\n",
    "            if os.path.isfile(exe_try) and os.access(exe_try, os.X_OK):\n",
    "                return True\n",
    "        return False\n",
    "\n",
    "    @abc.abstractmethod\n",
    "    def get_external_version(self, path=None):\n",
    "        \"\"\"\n",
    "        Returns the external version of the tool by executing\n",
    "        >{command} --version\n",
    "\n",
    "        might be dependent on the method and has to be overwritten\n",
    "        therefore it is declared abstract to enforce the user to\n",
    "        overwrite the method. The function in the base class can be called\n",
    "        with super()\n",
    "\n",
    "        :param (str) path: - optional specification of executable path if deviant from self.__command\n",
    "        :return: str - The external version of the tool\n",
    "        \"\"\"\n",
    "        exe = self.command.split()[0] if path is None else path\n",
    "        try:\n",
    "            p = subprocess.Popen(exe + ' --version', shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n",
    "            p.wait() #block the rest\n",
    "            stdo, stde = p.communicate()\n",
    "            stdr = p.returncode\n",
    "            if stdr > 0:\n",
    "                raise RuntimeError(\"Could not check version of \" + exe + \" - Please check your installation and FRED2 \"\n",
    "                                                                         \"wrapper implementation.\")\n",
    "        except Exception as e:\n",
    "                raise RuntimeError(e)\n",
    "        return str(stdo).strip()\n",
    "\n",
    "    @abc.abstractmethod\n",
    "    def prepare_peptide_input(self, _peptides, _file):\n",
    "        \"\"\"\n",
    "        Prepares sequence input for external tools\n",
    "        and writes them to _file in the specific format\n",
    "\n",
    "        NO return value!\n",
    "\n",
    "        :param: (list(str)) _peptides: the peptide sequences to write into _file\n",
    "        :param (File) _file: File handler to input file for external tool\n",
    "        \"\"\"\n",
    "        return NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The binaries have to be globally executable and the internal version number has to match the version of the external tool. When specifying the command line call, please use the following placeholders `{peptides}` as sequence input, `{alleles}` as allele input, and `{out}` as output file. `{options}` can be used to allow user specific optional command line flags that will directly be passed through Fred2 to the external tool. The placeholder are later filled with the appropriate replacements via Pythons `string.format()` function.\n",
    "\n",
    "As usual `Fred2.EpitopePrediction.AExternalEpitopePrediction` combines `AEpitopePrediction` and `AExternal` and provides a implementation of `predict()` that calls all other methods to implement the complete interaction between Fred2 and the external tools. Additionally, `AExternalEpitopePrediction` extends the interface of `AEpitopePrediction.predict()`to the following:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class AExternalEpitopePrediction(AEpitopePrediction, AExternal):\n",
    "    \"\"\"\n",
    "        Abstract class representing external prediction tools.\n",
    "\n",
    "    \"\"\"\n",
    "\n",
    "    def predict(self, peptides, alleles=None, command=None, options=None, **kwargs):\n",
    "        ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two new optional parameters `command=None` and `options=None` can be used to specify a path to an alternative binary and additional command line options respectively. The alternative binary should be of the same version as the version specified in the Fred2 implementation or at least should produce the same output format. The optional command line parameter are not tested by Fred2 and are directly handed over to the external tool.\n",
    "\n",
    "New external Methods should inherit from `AExternalEpitopePrediction` and implement the missing properties and functions accordingly."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h3> Simple Example of a new Prediction Method </h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from Fred2.EpitopePrediction import APSSMEpitopePrediction\n",
    "from Fred2.Core import EpitopePredictionResult\n",
    "import random\n",
    "import pandas\n",
    "\n",
    "class RandomEpitopePrediction(APSSMEpitopePrediction):\n",
    "    __alleles = [\"A*02:01\"]\n",
    "    __supported_length = [9]\n",
    "    __name = \"random\"\n",
    "    __version= \"1.0\"\n",
    "    \n",
    "    #the interface defines three class properties\n",
    "    @property\n",
    "    def name(self):\n",
    "        #retunrs the name of the predictor\n",
    "        return self.__name\n",
    "    \n",
    "    @property\n",
    "    def supportedAlleles(self):\n",
    "        #returns the supported alleles as strings (without the HLA prefix)\n",
    "        return self.__alleles\n",
    "    \n",
    "    @property\n",
    "    def supportedLength(self):\n",
    "        #returns the supported epitope lengths as iterable\n",
    "        return self.__supported_length\n",
    "    \n",
    "    @property\n",
    "    def version(self):\n",
    "        #returns the version of the predictor\n",
    "        return self.__version\n",
    "    \n",
    "    #the interface defines a function converting Fred2's HLA allele presentation\n",
    "    #into an internal presentation used by different methods.\n",
    "    #for this predictor we won't need it but still have to provide it!\n",
    "    #the function consumes a list of alleles and converts them into the internally used presentation\n",
    "    def convert_alleles(self, alleles):\n",
    "        #we just use the identity function\n",
    "        return alleles\n",
    "    \n",
    "    #additionally the interface defines a function `predict` \n",
    "    #that consumes a list of peptides or a single peptide and optionally a list \n",
    "    #of allele objects\n",
    "    #\n",
    "    #this method implements the complete prediction routine\n",
    "    def predict(self, peptides, alleles=None):\n",
    "        \n",
    "        #test whether one peptide or a list\n",
    "        if isinstance(peptides, Peptide):\n",
    "            peptides = [peptides]\n",
    "        \n",
    "        #if no alleles are specified do predictions for all supported alleles\n",
    "        if alleles is None:\n",
    "            alleles = self.supportedAlleles\n",
    "        else:\n",
    "            #filter for supported alleles\n",
    "            alleles = filter(lambda a: a.name in self.supportedAlleles, alleles) \n",
    "        \n",
    "        result = {}\n",
    "        #now predict binding/non-binding for each peptide at random\n",
    "        for a in alleles:\n",
    "            result[a] = {}\n",
    "            for p in peptides:\n",
    "                if random.random() >= 0.5:\n",
    "                    result[a][p] = 1.0\n",
    "                else:\n",
    "                    result[a][p] = 0.0\n",
    "        \n",
    "        #create EpitopePredictionResult object. This is a multi-indexed DataFrame \n",
    "        #with Peptide and Method as multi-index and alleles as columns\n",
    "        df_result = EpitopePredictionResult.from_dict(result)\n",
    "        df_result.index = pandas.MultiIndex.from_tuples([tuple((i,self.name)) for i in df_result.index],\n",
    "                                                        names=['Seq','Method'])\n",
    "        return df_result\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now lets use our new predictor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from Fred2.EpitopePrediction import EpitopePredictorFactory\n",
    "from Fred2.Core import Peptide\n",
    "\n",
    "EpitopePredictorFactory(\"random\").predict(Peptide(\"SYFPEITHI\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Chapter 2: Abstract Base-Classes for HLA Typing </h2>\n",
    "\n",
    "Fred2 currently offers only an interface to integrate external HLA typing tools, as the algorithms are quite often very involved and use additional third-party tools for data pre-processing. As usual, a very basic interface called `Fred2.Core.AHLATyping`: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class AHLATyping(object):\n",
    "    __metaclass__ = APluginRegister\n",
    "\n",
    "    @abc.abstractproperty\n",
    "    def name(self):\n",
    "        \"\"\"\n",
    "        Returns the name of the predictor\n",
    "\n",
    "        :return:\n",
    "        \"\"\"\n",
    "        raise NotImplementedError\n",
    "\n",
    "    @abc.abstractproperty\n",
    "    def version(self):\n",
    "        \"\"\"\n",
    "        parameter specifying the version of the prediction method\n",
    "\n",
    "        \"\"\"\n",
    "        raise NotImplementedError\n",
    "\n",
    "    @abc.abstractmethod\n",
    "    def predict(self, ngsFile, output, **kwargs):\n",
    "        \"\"\"\n",
    "        Prediction method calling the HLA typing algorithm\n",
    "\n",
    "        :param str ngsFile: The path to the input file containing the NGS reads\n",
    "        :param str output: The path to the output file or directory\n",
    "        :param kwargs: optional parameters directly handed over to the algorithm without checking\n",
    "        :return: list(Allele) - A list of HLA alleles representing the genotype predicted by the algorithm\n",
    "        \"\"\"\n",
    "        raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "defines rudimentary functionality. `Fred2.HLATyping.AExternalHLATyping` combines `Fred2.Core.AHLATyping` and `Fred2.Core.AExternal` and implements `predict()` with an overwritten interface and extends the interface of `Fred2.HLATyping.AExternalHLATyping` by an additional abstract method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class AExternalHLATyping(AHLATyping, AExternal):\n",
    "\n",
    "    def predict(self, ngsFile, output, command=None, options=None, delete=True, **kwargs):\n",
    "        \"\"\"\n",
    "        Implementation of prediction\n",
    "\n",
    "        :param str ngsFile: The path to the NGS file of interest\n",
    "        :param str output: The path to the output file or directory\n",
    "        :param str command: The path to a alternative binary (if binary is not globally executable)\n",
    "        :param str options: A string with additional options that is directly past to the tool\n",
    "        :param bool delete: Boolean indicator whether generated files should be deleted afterwards\n",
    "        :return: list(Allele) - A list of Allele objects representing the most likely HLA genotype\n",
    "        \"\"\"\n",
    "\n",
    "       ...\n",
    "    \n",
    "    @abc.abstractmethod\n",
    "    def clean_up(self, _output):\n",
    "        \"\"\"\n",
    "        Cleans the generated files after prediction\n",
    "\n",
    "        :param str output: The path to the output file or directory\n",
    "        \"\"\"\n",
    "        raise NotImplementedError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similar to the optional inputs `command=None` and `options=None` in AExternalEpitopePrediction, these options can be used to specify the path to an alternative binary and additional command line inputs respectively."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
